1. Noise and Probabilistic Target
briefly introduced noise before pocket algorithm













customer record (from the slide figure): NTD 1,000,000; current debt 200,000; credit? {no(-1), yes(+1)}

• noise in y: good customer, 'mislabeled' as bad?
• noise in y: same customers, different labels?
• noise in x: inaccurate customer information?

does VC bound work under noise?
Probabilistic Marbles

one key of VC bound: marbles!

'deterministic' marbles
• marble $x \sim P(x)$
• deterministic color $[f(x) \neq h(x)]$

'probabilistic' (noisy) marbles
• marble $x \sim P(x)$
• probabilistic color $[y \neq h(x)]$ with $y \sim P(y|x)$

same nature: can estimate P[orange] if marbles are drawn i.i.d.

VC holds for $x \overset{\text{i.i.d.}}{\sim} P(x)$, $y \overset{\text{i.i.d.}}{\sim} P(y|x)$, i.e. $(x, y) \overset{\text{i.i.d.}}{\sim} P(x, y)$

$P(y|x)$ is called the target distribution. It tells us what the best prediction is and how much noise comes with it. A noise-free, deterministic target can be seen as a special case in which the probabilities are only 1 and 0:

$P(y|x) = 1$, for $y = f(x)$

$P(y|x) = 0$, for $y \neq f(x)$


Target Distribution $P(y|x)$

characterizes behavior of 'mini-target' on one $x$

• can be viewed as 'ideal mini-target' + noise, e.g.
  • $P(\circ|x) = 0.7$, $P(\times|x) = 0.3$
  • ideal mini-target $f(x) = \circ$
  • 'flipping' noise level = 0.3
• deterministic target $f$: special case of target distribution
  • $P(y|x) = 1$ for $y = f(x)$
  • $P(y|x) = 0$ for $y \neq f(x)$

goal of learning:
predict ideal mini-target (w.r.t. $P(y|x)$) on often-seen inputs (w.r.t. $P(x)$)
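
The example above can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes the two labels are encoded as +1 (circle) and -1 (cross), and the helper names `ideal_mini_target`, `flipping_noise_level` and `sample_label` are hypothetical. It reads the ideal mini-target and the flipping-noise level off $P(y|x)$, then draws noisy labels to confirm that they disagree with the ideal mini-target about 30% of the time.

```python
import random

def ideal_mini_target(p_y_given_x):
    """Most probable label under P(y|x)."""
    return max(p_y_given_x, key=p_y_given_x.get)

def flipping_noise_level(p_y_given_x):
    """Probability that a drawn label differs from the ideal mini-target."""
    return 1.0 - p_y_given_x[ideal_mini_target(p_y_given_x)]

def sample_label(p_y_given_x, rng):
    """Draw one label y ~ P(y|x)."""
    labels = list(p_y_given_x)
    weights = [p_y_given_x[y] for y in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

# Encoding assumption: +1 stands for the circle label, -1 for the cross label.
noisy = {+1: 0.7, -1: 0.3}
# Deterministic target as a special case: P(y|x) = 1 for y = f(x), 0 otherwise.
deterministic = {+1: 1.0, -1: 0.0}

rng = random.Random(0)
draws = [sample_label(noisy, rng) for _ in range(10000)]
flip_rate = sum(y != ideal_mini_target(noisy) for y in draws) / len(draws)

print("ideal mini-target:", ideal_mini_target(noisy))
print("noise level:", round(flipping_noise_level(noisy), 3))
print("empirical flip rate:", round(flip_rate, 3))
```

The deterministic dictionary gives a noise level of exactly zero, which is the sense in which a deterministic target is a special case of a target distribution.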
[Figure: learning flow. Training examples $\mathcal{D}: (x_1, y_1), \ldots, (x_N, y_N)$ and hypothesis set $\mathcal{H}$ feed the learning algorithm $\mathcal{A}$, which outputs the final hypothesis $g \approx f$.]

VC still works, pocket algorithm explained :-)


2. Error Measure

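In machine learning we need to ask how close the learned hypothesis $g$ is to the target $f$. So far we have used $E_{\text{out}}$ to estimate this; what forms can an error measure take in general?

The error measure used so far has three properties:

• out-of-sample: averaged over unknown data
• pointwise: evaluated on each data point $x$
• classification: checks whether prediction and target agree; the classification error is usually called the 0/1 error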


• how well? previously, considered out-of-sample measure
  $E_{\text{out}}(g) = \mathbb{E}_{x \sim P} [g(x) \neq f(x)]$
• more generally, error measure $E(g, f)$
• naturally considered
  • out-of-sample: averaged over unknown $x$
  • pointwise: evaluated on one $x$
  • classification: [prediction $\neq$ target]


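Pointwise error is the simplest and most commonly used kind of error measure in machine learning, and it is the kind mainly considered from here on. With a pointwise error err, $E_{\text{in}}$ and $E_{\text{out}}$ are:

in-sample: $E_{\text{in}}(g) = \frac{1}{N} \sum_{n=1}^{N} \mathrm{err}(g(x_n), f(x_n))$

out-of-sample: $E_{\text{out}}(g) = \mathbb{E}_{x \sim P}\, \mathrm{err}(g(x), f(x))$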


A pointwise error is computed on each data point and then averaged. Two pointwise errors are especially common: the 0/1 error and the squared error. The 0/1 error is typically used for classification, and the squared error for regression.

0/1 error
$\mathrm{err}(\tilde{y}, y) = [\tilde{y} \neq y]$
• correct or incorrect?
• often for classification

squared error
$\mathrm{err}(\tilde{y}, y) = (\tilde{y} - y)^2$
• how far is $\tilde{y}$ from $y$?
• often for regression
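
As an illustration of how a pointwise error plugs into the in-sample error, here is a small sketch. The toy data and the threshold hypothesis `g` are made up for this example and do not come from the notes; only the two error definitions and the averaging follow the text.

```python
def err_01(y_pred, y):
    """0/1 error: 1 if the prediction is incorrect, 0 if correct."""
    return 1.0 if y_pred != y else 0.0

def err_sq(y_pred, y):
    """Squared error: how far the prediction is from the target."""
    return (y_pred - y) ** 2

def in_sample_error(g, examples, err):
    """E_in(g): average of the pointwise error over the examples."""
    return sum(err(g(x), y) for x, y in examples) / len(examples)

def g(x):
    """Hypothetical hypothesis: predict +1 for positive inputs, else -1."""
    return 1 if x > 0 else -1

# Hypothetical 1-D examples (x, y); the last one is misclassified by g.
examples = [(-2.0, -1), (-1.0, -1), (0.5, +1), (1.5, -1)]

print("E_in with 0/1 error:", in_sample_error(g, examples, err_01))
print("E_in with squared error:", in_sample_error(g, examples, err_sq))
```

Swapping the `err` argument is the only change needed to move between the two measures, which is the practical appeal of the pointwise form.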


The ideal mini-target is determined jointly by $P(y|x)$ and err, so it is computed differently under the 0/1 error and under the squared error. In the example below, the ideal mini-target under the 0/1 error is the label with the largest $P(y|x)$, while under the squared error it is the average of $y$ weighted by $P(y|x)$.


Ideal Mini-Target
interplay between noise and error:

$P(y|x)$ and err define ideal mini-target $f(x)$

$P(y = 1|x) = 0.2$, $P(y = 2|x) = 0.7$, $P(y = 3|x) = 0.1$

  prediction    avg. err, 0/1 error     avg. err, squared error
  1             0.8                     1.1
  2             0.3 (*)                 0.3
  3             0.9                     1.5
  1.9           1.0 (really? :-))       0.29 (*)

0/1 error: $f(x) = \operatorname{argmax}_{y \in \mathcal{Y}} P(y|x)$
squared error: $f(x) = \sum_{y \in \mathcal{Y}} y \cdot P(y|x)$
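
The averages in this example can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the function name `avg_err` is a hypothetical helper, while the probabilities and candidate predictions are the ones from the example.

```python
# Target distribution from the example: P(y=1|x)=0.2, P(y=2|x)=0.7, P(y=3|x)=0.1
P = {1: 0.2, 2: 0.7, 3: 0.1}

def err_01(y_pred, y):
    return 1.0 if y_pred != y else 0.0

def err_sq(y_pred, y):
    return (y_pred - y) ** 2

def avg_err(y_pred, P, err):
    """Expected pointwise error of predicting y_pred when y ~ P(y|x)."""
    return sum(p * err(y_pred, y) for y, p in P.items())

for candidate in (1, 2, 3, 1.9):
    print(candidate,
          round(avg_err(candidate, P, err_01), 2),
          round(avg_err(candidate, P, err_sq), 2))

# Ideal mini-targets: most probable label (0/1 error), weighted mean (squared error).
f_01 = max(P, key=P.get)
f_sq = sum(y * p for y, p in P.items())
print("ideal mini-target, 0/1 error:", f_01)
print("ideal mini-target, squared error:", round(f_sq, 2))
```

Predicting 1.9 can never match a label exactly, so its 0/1 error is 1.0, yet it is the best choice under the squared error.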
With an error measure added, the learning flow becomes:

[Figure: learning flow with the error measure err added: unknown target distribution $P(y|x)$ containing $f(x)$ + noise, training examples, final hypothesis.]



3. Algorithmic Error Measure


There are two kinds of error: false accept and false reject. A false accept wrongly treats a negative example as positive; a false reject wrongly treats a positive example as negative. Depending on the application, the two should be weighted differently: in some settings a false reject should cost more, in others a false accept should cost more.


two types of error: false accept and false reject

              g = +1          g = -1
  f = +1      no error        false reject
  f = -1      false accept    no error
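
To make the two error types concrete, the following sketch counts them separately for a list of predictions. The label lists are invented for illustration and the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(predictions, targets):
    """Return (false_accepts, false_rejects) for labels in {+1, -1}."""
    # False accept: the target is -1 but the prediction says +1.
    false_accept = sum(1 for g, f in zip(predictions, targets)
                       if f == -1 and g == +1)
    # False reject: the target is +1 but the prediction says -1.
    false_reject = sum(1 for g, f in zip(predictions, targets)
                       if f == +1 and g == -1)
    return false_accept, false_reject

# Hypothetical targets f and predictions g.
targets     = [+1, +1, -1, -1, -1, +1]
predictions = [+1, -1, +1, -1, +1, +1]

print(confusion_counts(predictions, targets))
```

Keeping the two counts apart is what allows them to be weighted differently later.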


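The error measure that the algorithm $\mathcal{A}$ actually optimizes, its cost function, can be chosen in several ways. The true err is often hard to use directly, so a plausible or a friendly substitute is commonly used instead, depending on the problem.


Algorithmic Error Measures $\widehat{\mathrm{err}}$


• true: just err
• plausible:
  • 0/1: minimum 'flipping noise'; NP-hard to optimize, remember? :-)
  • squared: minimum Gaussian noise
• friendly: easy to optimize for $\mathcal{A}$
  • closed-form solution
  • convex objective function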

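With the algorithmic error measure added, the learning flow becomes:

[Figure: learning flow. The learning algorithm $\mathcal{A}$ uses $\widehat{\mathrm{err}}$ together with the training examples $\mathcal{D}: (x_1, y_1), \ldots, (x_N, y_N)$ and the hypothesis set $\mathcal{H}$.]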


4. Weighted Classification













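In practice, the cost function of a learning algorithm comes from these errors: the objective that the algorithm minimizes is the in-sample error $E_{\text{in}}$.

In the cost function, false accept and false reject are given different weights, and the algorithm has to reflect this. One way to penalize errors with different weights is 'virtual copying'.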


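Systematic Route: Connect $E_{\text{in}}^{w}$ and $E_{\text{in}}^{0/1}$


original problem (cost matrix and data):

              h(x) = +1    h(x) = -1
  y = +1      0            1
  y = -1      1000         0

  D: (x_1, +1), (x_2, -1), (x_3, -1), ..., (x_{N-1}, +1), (x_N, +1)

equivalent problem (cost matrix and data):

              h(x) = +1    h(x) = -1
  y = +1      0            1
  y = -1      1            0

  D: (x_1, +1); (x_2, -1), (x_2, -1), ..., (x_2, -1); (x_3, -1), (x_3, -1), ..., (x_3, -1); ...; (x_{N-1}, +1); (x_N, +1)


after copying -1 examples 1000 times,
$E_{\text{in}}^{w}$ for LHS $\equiv$ $E_{\text{in}}^{0/1}$ for RHS!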
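
The equivalence can be verified on a toy problem. The sketch below uses made-up 1-D data and a hypothetical threshold hypothesis `h`; the cost of 1000 for a false accept follows the cost matrix above. It compares summed errors rather than averages, since the two data sets have different sizes and the averages differ only by a constant factor.

```python
def weighted_error_sum(h, examples, cost_false_accept=1000, cost_false_reject=1):
    """Total weighted cost of h on the original examples."""
    total = 0
    for x, y in examples:
        if h(x) != y:
            # A mistake on a -1 example is a false accept; on a +1 example, a false reject.
            total += cost_false_accept if y == -1 else cost_false_reject
    return total

def virtual_copy(examples, copies=1000):
    """Repeat every -1 example `copies` times; keep +1 examples once."""
    copied = []
    for x, y in examples:
        copied.extend([(x, y)] * (copies if y == -1 else 1))
    return copied

def zero_one_error_sum(h, examples):
    """Number of plain 0/1 mistakes of h on the examples."""
    return sum(1 for x, y in examples if h(x) != y)

def h(x):
    """Hypothetical hypothesis: +1 for positive inputs, else -1."""
    return 1 if x > 0 else -1

# Hypothetical data: h makes one false accept (2.0, -1) and one false reject (-0.5, +1).
examples = [(0.5, +1), (-1.0, -1), (2.0, -1), (1.0, +1), (-0.5, +1)]
copied = virtual_copy(examples)

print(weighted_error_sum(h, examples), zero_one_error_sum(h, copied), len(copied))
```

Because the two sums agree for every hypothesis, whichever hypothesis minimizes the 0/1 error on the copied data also minimizes the weighted error on the original data.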


Weighted Pocket Algorithm


using 'virtual copying', weighted pocket algorithm includes:

• weighted PLA:
  randomly check -1 example mistakes with 1000 times more probability
• weighted pocket replacement:
  if $w_{t+1}$ reaches smaller $E_{\text{in}}^{w}$ than $\hat{w}$, replace $\hat{w}$ by $w_{t+1}$


systematic route (called 'reduction'):
can be applied to many other algorithms!
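
The two modifications above can be sketched in code without physically copying any data. This is one possible reading of the slide, not a reference implementation: the function names, the toy data, the iteration count and the fixed random seed are all assumptions made for the example. Mistakes on -1 examples are sampled with 1000 times the weight of mistakes on +1 examples, and the pocket keeps the weights with the smallest weighted in-sample error seen so far.

```python
import random

def predict(w, x):
    """Perceptron prediction sign(w . x); a score of 0 is treated as -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

def weighted_ein(w, X, y, cost_neg=1000):
    """Weighted in-sample error: a mistake on a -1 example costs cost_neg."""
    total = 0
    for xn, yn in zip(X, y):
        if predict(w, xn) != yn:
            total += cost_neg if yn == -1 else 1
    return total / len(X)

def weighted_pocket(X, y, cost_neg=1000, iters=200, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    best_w, best_err = list(w), weighted_ein(w, X, y, cost_neg)
    for _ in range(iters):
        mistakes = [n for n in range(len(X)) if predict(w, X[n]) != y[n]]
        if not mistakes:
            break
        # Weighted PLA: -1 mistakes are picked with cost_neg times more probability.
        weights = [cost_neg if y[n] == -1 else 1 for n in mistakes]
        n = rng.choices(mistakes, weights=weights, k=1)[0]
        w = [wi + y[n] * xi for wi, xi in zip(w, X[n])]
        # Weighted pocket replacement: keep w only if its weighted error is smaller.
        err = weighted_ein(w, X, y, cost_neg)
        if err < best_err:
            best_w, best_err = list(w), err
    return best_w, best_err

# Hypothetical data; the first coordinate is the constant bias input 1.0.
X = [(1.0, 2.0, 1.0), (1.0, 1.5, 2.0), (1.0, -1.0, -1.5),
     (1.0, -2.0, -0.5), (1.0, 1.0, 1.0), (1.0, -0.5, -1.0)]
y = [+1, +1, -1, -1, -1, +1]

best_w, best_err = weighted_pocket(X, y)
print(best_w, best_err)
```

Sampling by weight has the same effect as drawing uniformly from the virtually copied data set, which is why no actual copies are needed.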